Inference with Few Heterogeneous Clusters∗
Authors
Abstract
Suppose estimating a model on each of a small number of potentially heterogeneous clusters yields approximately independent, unbiased and Gaussian parameter estimators. We make two contributions in this set-up. First, we show how to compare a scalar parameter of interest between treatment and control units using a two-sample t-statistic, extending previous results for the one-sample t-statistic. Second, we develop a test for the appropriate level of clustering, which tests the null hypothesis that clustered standard errors from a much finer partition are correct. We illustrate the approach by revisiting empirical studies involving clustered, time series and spatially correlated data.
JEL classification: C12, C14, C32
∗ The authors would like to thank three anonymous referees and seminar and conference participants at various universities and conferences for helpful comments. Ibragimov gratefully acknowledges partial support via NSF grant SES-0820124 and grants from the GDN-SEE and CIS Research Competition, the Russian Ministry of Education and Science (Innopolis University) and the Russian Government Program of Competitive Growth of Kazan Federal University (Higher Institute of Information Technologies and Information Systems), and Müller gratefully acknowledges support by the NSF via grant SES-0518036. We are indebted to Nail Bakirov (1952-2010) and Daniyar Mushtari (1945-2013) for inspiring discussions, comments and attention to our work. We are also thankful to Aprajit Mahajan for sharing data and useful discussions, and to Chenchuan (Mark) Li for outstanding research assistance.
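As a rough illustration of the first contribution, the sketch below computes a two-sample t-statistic from cluster-level estimates of a scalar parameter. It is only a sketch under stated assumptions: the abstract does not give the exact form of the statistic or the reference distribution, so the unequal-variance form is assumed here, the function name `two_sample_t` is hypothetical, and the numbers are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(treated, control):
    """Two-sample t-statistic from per-cluster parameter estimates.

    Assumption: the unequal-variance form is used; the abstract does not
    specify which variant of the two-sample statistic the authors adopt.
    """
    if len(treated) < 2 or len(control) < 2:
        raise ValueError("need at least two cluster estimates per group")
    # Difference between the average treated and average control estimate.
    diff = mean(treated) - mean(control)
    # Standard error built from the sample variances of the two groups.
    se = sqrt(variance(treated) / len(treated)
              + variance(control) / len(control))
    return diff / se


# Hypothetical cluster-level estimates, invented for illustration only.
treated_estimates = [1.2, 0.8, 1.5, 1.1]
control_estimates = [0.4, 0.9, 0.3, 0.6, 0.5]
print(f"two-sample t-statistic: {two_sample_t(treated_estimates, control_estimates):.3f}")
```

Each list entry stands for the estimate obtained by fitting the model on one cluster, so the sample sizes are the numbers of treated and control clusters rather than the numbers of underlying observations. Which critical value to compare the statistic against is left open here, since the abstract does not state it.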
Similar resources
Inference with Few Heterogenous Clusters
Consider inference with a small number of potentially heterogeneous clusters. Suppose estimating the model on each cluster yields q asymptotically unbiased, independent Gaussian estimators with potentially heterogeneous variances. Following Ibragimov and Müller (2010), one can then conduct asymptotically valid inference with a standard t-test based on the q cluster estimators, since at conventi...
Viral Clustering:
Cluster validation constitutes one of the most challenging problems in unsupervised cluster analysis. For example, identifying the true number of clusters present in a dataset has been investigated for decades, and is still puzzling researchers today. The difficulty stems from the high variety of the dataset characteristics. Some datasets exhibit a strong structure with a few well-separated and ...
Viral Clustering: A Robust Method to Extract Structures in Heterogeneous Datasets
Cluster validation constitutes one of the most challenging problems in unsupervised cluster analysis. For example, identifying the true number of clusters present in a dataset has been investigated for decades, and is still puzzling researchers today. The difficulty stems from the high variety of the dataset characteristics. Some datasets exhibit a strong structure with a few well-separated and...
Inference with Correlated Clusters
This paper introduces a method which permits valid inference given a finite number of heterogeneous, correlated clusters. It is common in empirical analysis to use inference methods which assume that each unit is independent. Panel data permit this assumption to be relaxed as it is possible to estimate the correlations across clusters and isolate the independent variation in each cluster for pr...
Cluster-robust Bootstrap Inference in Quantile Regression Models
In this paper I develop a wild bootstrap procedure for cluster-robust inference in linear quantile regression models. I show that the bootstrap leads to asymptotically valid inference on the entire quantile regression process in a setting with a large number of small, heterogeneous clusters and provides consistent estimates of the asymptotic covariance function of that process. The proposed boo...
Randomization Inference for Differences-in-Differences with Few Treated Clusters
Inference using difference-in-differences with clustered data requires care. Previous research has shown that, when there are few treated clusters, t tests based on a cluster-robust variance estimator (CRVE) severely over-reject, different variants of the wild cluster bootstrap can over-reject or under-reject dramatically, and procedures based on randomization inference show promise. We demonstrate...